
kernfs_memcg: Add helpers to gather memcgroup related data #96

Open
wants to merge 4 commits into main


imran-kn
Contributor

imran-kn commented Aug 1, 2024

As of now, this is just a dump of some of my bespoke debug scripts.

oracle-contributor-agreement bot added the OCA Verified label (All contributors have signed the Oracle Contributor Agreement.) Aug 1, 2024
imran-kn changed the title from "DRAFT: kernfs_memcg: starting work." to "kernfs_memcg: Add helpers to gather memcgroup related data" Oct 21, 2024
@imran-kn
Contributor Author

> As of now, this is just a dump of some of my bespoke debug scripts.

I have added other helpers and modified the earlier ones so that they also work with other UEK versions.

Member

biger410 left a comment


Thanks for creating memcg helpers. I added a couple of comments. Besides that, please create a bug for it and put the bug number in the git log.

drgn_tools/kernfs_memcg.py: 3 review threads (outdated, resolved)
imran-kn added a commit that referenced this pull request Nov 25, 2024
@imran-kn
Contributor Author

Thanks @biger410 for reviewing this. I have addressed your review comments. Could you please have a look and let me know if you have any further feedback?

Add kernfs-based helpers to extract memcg-related information,
such as the number of active and inactive memcgroups and the
page-cache pages pinning memcgroups.

Orabug: 37322867
Signed-off-by: Imran Khan <[email protected]>
Orabug: 37322867
Signed-off-by: Imran Khan <[email protected]>
Orabug: 37322867
Signed-off-by: Imran Khan <[email protected]>
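A minimal sketch of the active/inactive classification mentioned in the commit message, assuming "active" means a memcg whose css has CSS_ONLINE set and "inactive" means an offline memcg that has not been freed yet. The flag values are copied from include/linux/cgroup-defs.h and are an assumption here (the real helpers may read them from debug info); `CssFlags`, `decode_css_flags` and `count_active_inactive` are hypothetical names, not the module's API.

```python
from enum import IntFlag
from typing import Iterable, Tuple


class CssFlags(IntFlag):
    # ASSUMED values mirroring include/linux/cgroup-defs.h; verify them on the
    # target kernel, since newer kernels may define additional bits.
    CSS_NO_REF = 1 << 0
    CSS_ONLINE = 1 << 1
    CSS_RELEASED = 1 << 2
    CSS_VISIBLE = 1 << 3


def decode_css_flags(flags: int) -> str:
    """Render a raw css->flags value as e.g. 'CSS_ONLINE|CSS_VISIBLE'."""
    names = [f.name for f in CssFlags if flags & f]
    return "|".join(names) if names else "0"


def count_active_inactive(flag_values: Iterable[int]) -> Tuple[int, int]:
    """Count online (active) vs offline-but-not-freed (inactive) memcgs."""
    active = inactive = 0
    for flags in flag_values:
        if flags & CssFlags.CSS_ONLINE:
            active += 1
        else:
            inactive += 1
    return active, inactive


if __name__ == "__main__":
    # In real use these raw values would come from walking memcg css
    # structures with drgn; here they are made-up inputs.
    sample = [0b1010, 0b1010, 0b0100, 0b0000]
    print(decode_css_flags(sample[0]))    # CSS_ONLINE|CSS_VISIBLE
    print(count_active_inactive(sample))  # (2, 2)
```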
@biger410
Member

The new changes look good to me. One more request: can we add a corelens module for it? It should run with either the -M option or the -A option.

Add a corelens module to dump pages that are pinning memcgroups.
By default, up to 10K such pages are shown.

Orabug: 37322867
Signed-off-by: Imran Khan <[email protected]>
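A minimal sketch of the "up to 10K pages" behaviour, assuming the module stops scanning as soon as the cap is reached. `capped_matches`, `pages` and `is_pinning` are hypothetical stand-ins for the real drgn page iterator and memcg check; passing `limit=None` shows one possible way to express an "all pages" mode.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def capped_matches(
    pages: Iterable[T],
    is_pinning: Callable[[T], bool],
    limit: Optional[int] = 10_000,
) -> Iterator[T]:
    """Yield pages for which is_pinning() is true, at most `limit` of them.

    limit=None disables the cap, so every page is examined.
    """
    matches = (p for p in pages if is_pinning(p))
    # islice stops pulling from the generator once `limit` items have been
    # yielded, so the remaining pages are never examined.
    return islice(matches, limit)


if __name__ == "__main__":
    fake_pages = range(1_000_000)
    print(list(capped_matches(fake_pages, lambda p: p % 3 == 0, limit=5)))
    # [0, 3, 6, 9, 12]
```

Because the filtering is lazy, the cost of a capped run depends on how early the matches appear, not on the total number of pages.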
@imran-kn
Contributor Author

> The new changes look good to me. One more request: can we add a corelens module for it? It should run with either the -M option or the -A option.

I have added a corelens module.

python3 -m drgn_tools.corelens ~/Local-vmcores/Workqueue-study/UEK-6-rds-issue/VMCORE -d ~/Local-vmcores/Workqueue-study/UEK-6-rds-issue/

page: 0xffffc35c881b0000 cgroup: /user.slice/user-0.slice/session-7128.scope state: CSS_ONLINE|CSS_VISIBLE path: /datastore/oracle_230124_0745EST/diag/asm/cell/scaqat06celadm02/incpkg/pkg_76/seq_1/cell/IPSPKG_20230120060027_COM_1.zip

page: 0xffffc35c881b0040 cgroup: /user.slice/user-0.slice/session-7128.scope state: CSS_ONLINE|CSS_VISIBLE path: /datastore/oracle_230124_0745EST/diag/asm/cell/scaqat06celadm02/incpkg/pkg_76/seq_1/cell/IPSPKG_20230120060027_COM_1.zip

page: 0xffffc35c881b0080 cgroup: /user.slice/user-0.slice/session-7128.scope state: CSS_ONLINE|CSS_VISIBLE path: /datastore/oracle_230124_0745EST/diag/asm/cell/scaqat06celadm02/incpkg/pkg_76/seq_1/cell/IPSPKG_20230120060027_COM_1.zip

page: 0xffffc35c881b00c0 cgroup: /user.slice/user-0.slice/session-7128.scope state: CSS_ONLINE|CSS_VISIBLE path: /datastore/oracle_230124_0745EST/diag/asm/cell/scaqat06celadm02/incpkg/pkg_76/seq_1/cell/IPSPKG_20230120060027_COM_1.zip

page: 0xffffc35c881b0100 cgroup: /user.slice/user-0.slice/session-7128.scope state: CSS_ONLINE|CSS_VISIBLE path: /datastore/oracle_230124_0745EST/diag/asm/cell/scaqat06celadm02/incpkg/pkg_76/seq_1/cell/IPSPKG_20230120060027_COM_1.zip

page: 0xffffc35c881b0140 cgroup: /user.slice/user-0.slice/session-7128.scope state: CSS_ONLINE|CSS_VISIBLE path: /datastore/oracle_230124_0745EST/diag/asm/cell/scaqat06celadm02/incpkg/pkg_76/seq_1/cell/IPSPKG_20230120060027_COM_1.zip

page: 0xffffc35c881b0180 cgroup: /user.slice/user-0.slice/session-7128.scope state: CSS_ONLINE|CSS_VISIBLE path: /datastore/oracle_230124_0745EST/diag/asm/cell/scaqat06celadm02/incpkg/pkg_76/seq_1/cell/IPSPKG_20230120060027_COM_1.zip

page: 0xffffc35c881b01c0 cgroup: /user.slice/user-0.slice/session-7128.scope state: CSS_ONLINE|CSS_VISIBLE path: /datastore/oracle_230124_0745EST/diag/asm/cell/scaqat06celadm02/incpkg/pkg_76/seq_1/cell/IPSPKG_20230120060027_COM_1.zip
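Since the output has one `page: … cgroup: … state: … path: …` record per line, a small post-processing step can condense thousands of lines into a per-cgroup, per-file count. This is a hypothetical helper, not part of the module; it assumes cgroup paths contain no spaces and that everything after `path:` is the file path.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Optional

# ASSUMED record layout, taken from the sample output in this thread.
LINE_RE = re.compile(
    r"^page:\s+(?P<page>0x[0-9a-f]+)\s+"
    r"cgroup:\s+(?P<cgroup>\S+)\s+"
    r"state:\s+(?P<state>\S+)\s+"
    r"path:\s+(?P<path>.+)$"
)


def parse_record(line: str) -> Optional[Dict[str, str]]:
    """Return the fields of one output record, or None for other lines."""
    m = LINE_RE.match(line.strip())
    return m.groupdict() if m else None


def pages_per_cgroup_and_file(lines: Iterable[str]) -> Counter:
    """Count pinning pages per (cgroup, path), skipping non-record lines."""
    counts: Counter = Counter()
    for line in lines:
        rec = parse_record(line)
        if rec is not None:
            counts[(rec["cgroup"], rec["path"])] += 1
    return counts


if __name__ == "__main__":
    cg = "/user.slice/user-0.slice/session-7128.scope"
    path = ("/datastore/oracle_230124_0745EST/diag/asm/cell/scaqat06celadm02"
            "/incpkg/pkg_76/seq_1/cell/IPSPKG_20230120060027_COM_1.zip")
    output = [
        f"page: 0xffffc35c881b0000 cgroup: {cg} state: CSS_ONLINE|CSS_VISIBLE path: {path}",
        "",
        f"page: 0xffffc35c881b0040 cgroup: {cg} state: CSS_ONLINE|CSS_VISIBLE path: {path}",
    ]
    for (cgroup, file_path), n in pages_per_cgroup_and_file(output).most_common():
        print(f"{n:6d}  {cgroup}  {file_path}")
```

In practice the lines would come from a saved corelens output file rather than an inline list.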

@biger410
Member

biger410 commented Nov 26, 2024

Is there a paste error? How does this command trigger the corelens module?

python3 -m drgn_tools.corelens ~/Local-vmcores/Workqueue-study/UEK-6-rds-issue/VMCORE -d ~/Local-vmcores/Workqueue-study/UEK-6-rds-issue/

And with a default of 10000, how can a user run the corelens command to dump all pages?

@imran-kn
Contributor Author

imran-kn commented Nov 28, 2024

> Is there a paste error? How does this command trigger the corelens module?
>
> python3 -m drgn_tools.corelens ~/Local-vmcores/Workqueue-study/UEK-6-rds-issue/VMCORE -d ~/Local-vmcores/Workqueue-study/UEK-6-rds-issue/
>
> And with a default of 10000, how can a user run the corelens command to dump all pages?

Yes, it was a copy-paste error; I missed the -M part. The actual command is:
python3 -m drgn_tools.corelens ~/Local-vmcores/Workqueue-study/UEK-6-rds-issue/VMCORE -d ~/Local-vmcores/Workqueue-study/UEK-6-rds-issue/ -M kernfs_memcg

The output looks like this:

```
imran@imran-metabox:~/oracle-samples-drgn-tools/drgn-tools$ python3 -m drgn_tools.corelens ~/Local-vmcores/Workqueue-study/UEK-6-rds-issue/VMCORE -d ~/Local-vmcores/Workqueue-study/UEK-6-rds-issue/ -M kernfs_memcg
page: 0xffffc35c881b0000 cgroup: /user.slice/user-0.slice/session-7128.scope state: CSS_ONLINE|CSS_VISIBLE path: /datastore/oracle_230124_0745EST/diag/asm/cell/scaqat06celadm02/incpkg/pkg_76/seq_1/cell/IPSPKG_20230120060027_COM_1.zip

page: 0xffffc35c881b0040 cgroup: /user.slice/user-0.slice/session-7128.scope state: CSS_ONLINE|CSS_VISIBLE path: /datastore/oracle_230124_0745EST/diag/asm/cell/scaqat06celadm02/incpkg/pkg_76/seq_1/cell/IPSPKG_20230120060027_COM_1.zip

page: 0xffffc35c881b0080 cgroup: /user.slice/user-0.slice/session-7128.scope state: CSS_ONLINE|CSS_VISIBLE path: /datastore/oracle_230124_0745EST/diag/asm/cell/scaqat06celadm02/incpkg/pkg_76/seq_1/cell/IPSPKG_20230120060027_COM_1.zip

page: 0xffffc35c881b00c0 cgroup: /user.slice/user-0.slice/session-7128.scope state: CSS_ONLINE|CSS_VISIBLE path: /datastore/oracle_230124_0745EST/diag/asm/cell/scaqat06celadm02/incpkg/pkg_76/seq_1/cell/IPSPKG_20230120060027_COM_1.zip

page: 0xffffc35c881b0100 cgroup: /user.slice/user-0.slice/session-7128.scope state: CSS_ONLINE|CSS_VISIBLE path: /datastore/oracle_230124_0745EST/diag/asm/cell/scaqat06celadm02/incpkg/pkg_76/seq_1/cell/IPSPKG_20230120060027_COM_1.zip

page: 0xffffc35c881b0140 cgroup: /user.slice/user-0.slice/session-7128.scope state: CSS_ONLINE|CSS_VISIBLE path: /datastore/oracle_230124_0745EST/diag/asm/cell/scaqat06celadm02/incpkg/pkg_76/seq_1/cell/IPSPKG_20230120060027_COM_1.zip

page: 0xffffc35c881b0180 cgroup: /user.slice/user-0.slice/session-7128.scope state: CSS_ONLINE|CSS_VISIBLE path: /datastore/oracle_230124_0745EST/diag/asm/cell/scaqat06celadm02/incpkg/pkg_76/seq_1/cell/IPSPKG_20230120060027_COM_1.zip

page: 0xffffc35c881b01c0 cgroup: /user.slice/user-0.slice/session-7128.scope state: CSS_ONLINE|CSS_VISIBLE path: /datastore/oracle_230124_0745EST/diag/asm/cell/scaqat06celadm02/incpkg/pkg_76/seq_1/cell/IPSPKG_20230120060027_COM_1.zip

............................................................
```

Regarding the default page count of 10K: I use this value to make sure we don't spend too much time collecting this data. We have seen that scanning all pages can take hours, so the idea is to gather information on 10K pages first and, if that is not conclusive, scan all pages later. Let me know if this sounds okay or if a value other than 10K would be more acceptable.
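To give a sense of why a full scan is slow, the sketch below counts the struct pages such a scan has to visit, assuming 4 KiB base pages. The memory sizes are illustrative, not taken from the vmcore in this thread. For 1 TiB of RAM this is 2^28 = 268,435,456 struct pages, each of which has to be read from the vmcore. Note that a cap on reported matches bounds the scan only when enough matches are found early; with fewer than 10K pinning pages, the whole range is still walked.

```python
# Back-of-envelope sketch: number of struct pages a full page scan visits.
# ASSUMPTION: 4 KiB base pages (the x86_64 default); sizes are examples only.
PAGE_SIZE = 4096


def struct_pages(mem_bytes: int, page_size: int = PAGE_SIZE) -> int:
    """Number of page frames (and hence struct pages) covering mem_bytes."""
    return mem_bytes // page_size


if __name__ == "__main__":
    for gib in (64, 512, 1024):
        n = struct_pages(gib * 2**30)
        print(f"{gib:>5} GiB -> {n:,} struct pages")
```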
